Spectral Recording Process
Technical Aspects
Dolby Laboratories, Inc.
San Francisco and London
386-6825
By Ray M. Dolby
February 24, 1986
Brief Outline
-------------
The spectral recording process includes some layout and operating
characteristics in common with those of the A-type, B-type, and C-type
noise reduction systems. Regarding general principles, reference should
be made to the technical papers on these systems; the C-type paper is
particularly relevant.
Referring to the figure called "Basic Block Diagram - Spectral
Recording Encoder and Decoder", a main signal path is primarily
responsible for conveying high level signals. The side chain signals are
additively combined with the main signal in the encoding mode and
subtractively in the decoding mode, whereby an overall complementary
action is obtained. In this diagram a dedicated encoder and decoder are
shown laid out in a symmetrical fashion; other decoder configurations
are possible in switchable circuits.
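The dual-path arrangement can be illustrated with a minimal numerical
sketch. It assumes a single stage with the 8 dB low-level gain quoted in
this paper, a made-up limiting law for the side chain (the KNEE value
and the shape of side_chain() are hypothetical), and a decoder whose
side chain is driven from the decoder's own output. It is not the SR
circuit, only a demonstration that adding and then subtracting the same
side chain signal gives complementary action.

```python
import math

STAGE_BOOST_DB = 8.0   # low-level gain of one stage (from the text)
KNEE = 0.01            # hypothetical limiting knee, linear amplitude


def side_chain(x: float) -> float:
    """Toy side chain: boosts small signals, limits large ones (assumed law)."""
    k = 10 ** (STAGE_BOOST_DB / 20) - 1   # side chain gain giving 8 dB overall
    return k * x / (1 + abs(x) / KNEE)


def encode(x: float) -> float:
    """Encoder: side chain signal is added to the main path signal."""
    return x + side_chain(x)


def decode(y: float, iterations: int = 60) -> float:
    """Decoder: side chain signal is subtracted from the main path signal.

    Solves x = y - side_chain(x) by bisection, which is safe here because
    encode() is strictly increasing.
    """
    lo, hi = (0.0, y) if y >= 0 else (y, 0.0)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if encode(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gain_db(x: float) -> float:
    """Encoder gain in dB for a positive input amplitude x."""
    return 20 * math.log10(encode(x) / x)
```

In this sketch gain_db(1e-6) is very close to 8 dB while gain_db(1.0)
is a small fraction of a dB, so the main path dominates at high levels,
and decode(encode(x)) returns x to within rounding error.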
The SR stage layout resembles that of the C-type system, except that
three levels of action staggering are used: high-level, mid-level, and
low-level. There are various advantages arising from the use of multi-
level stages, including action compounding for good spectral
discrimination, accuracy and reproducibility, low distortion, and low
overshoot. The thresholds used are about -30 dB, -48 dB, and -62 dB with
respect to reference level (which is about 20 dB below SR peak signal
level). For the high-level
and mid-level stages both high frequency and low frequency circuits are
used, with a crossover of 800 Hz. The low-level stage is high frequency
only, with an 800 Hz high pass characteristic. There is a significant
overlap region between the high and low frequency stages, the low
frequency stages extending their effects up to about 3 kHz. This overlap
contributes to the spectral tracking abilities of the system.
Each stage above has a low-level gain of 8dB, whereby a total dynamic
effect of 16dB is obtained at low frequencies and 24dB is obtained at
high frequencies. A further dynamic action of about 1-2dB takes place
above the reference level.
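The stage arithmetic above can be tabulated in a few lines. The sketch
below only restates figures given in the text (three levels, 8 dB per
stage, low-level stage high frequency only); the names STAGES,
total_boost_db and stage_count are illustrative.

```python
STAGE_GAIN_DB = 8  # low-level gain per stage (from the text)

# (level name, approximate threshold in dB re reference level, bands present)
STAGES = [
    ("high-level", -30, ("HF", "LF")),
    ("mid-level", -48, ("HF", "LF")),
    ("low-level", -62, ("HF",)),
]


def total_boost_db(band: str) -> int:
    """Sum the low-level gain of every stage that acts in the given band."""
    return sum(STAGE_GAIN_DB for _, _, bands in STAGES if band in bands)


def stage_count() -> int:
    """Count the individual band stages across all three levels."""
    return sum(len(bands) for _, _, bands in STAGES)


print(total_boost_db("LF"), total_boost_db("HF"), stage_count())
# prints: 16 24 5
```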
The spectral skewing network has the same purpose and function as in
the C-type system, except that a spectral skewing action is provided at
low frequencies as well. The high frequency network is a low pass filter
with an attenuation characteristic similar to that of a 12kHz two-pole
Butterworth filter. The low frequency network is a 40 Hz high pass
filter, connected in series with the high-frequency network, also with a
two-pole Butterworth-like characteristic. The spectral skewing networks
de-sensitize the stage circuits to the influence of signal components at
the extreme ends of the audio frequency band. This effect is
particularly helpful if the tape recorder has an uncertain frequency
response in these regions. The filters are of course also important in
attenuating subsonic and supersonic interferences of all kinds. The
spectral skewing action is compensated in the decoder, resulting in an
overall flat response.
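For orientation, the magnitude response of the two spectral skewing
networks can be approximated with ideal two-pole Butterworth formulas.
This is a sketch only: the text says the characteristics are "similar
to" Butterworth responses, so the exact curves of the real networks may
differ, and the function names are illustrative.

```python
import math


def lowpass2_db(f_hz: float, fc_hz: float = 12_000.0) -> float:
    """Magnitude in dB of an ideal two-pole Butterworth low pass filter."""
    return -10 * math.log10(1 + (f_hz / fc_hz) ** 4)


def highpass2_db(f_hz: float, fc_hz: float = 40.0) -> float:
    """Magnitude in dB of an ideal two-pole Butterworth high pass filter."""
    return -10 * math.log10(1 + (fc_hz / f_hz) ** 4)


def skewing_db(f_hz: float) -> float:
    """Series connection: the dB responses of the two networks add."""
    return lowpass2_db(f_hz) + highpass2_db(f_hz)


for f in (20, 40, 1_000, 12_000, 20_000):
    print(f"{f:>6} Hz: {skewing_db(f):6.2f} dB")
```

Each idealized network is 3 dB down at its corner frequency and falls
at 12 dB per octave beyond it, while the mid-band is left essentially
untouched.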
Both high frequency and low frequency antisaturation networks are
provided in the main signal path. The networks are operative above about
4 kHz and below about 100 Hz. There is an effective compounding of the
antisaturation effects produced by the antisaturation networks and the
spectral skewing networks. The overall result is an antisaturation
effect of 2-3 dB at 5 kHz, 6 dB at 10 kHz and 10 dB at 25 Hz and 15 kHz.
Least Treatment Principle
-------------------------
A design philosophy used in the development of the new system is that
the best treatment of the signal is the least treatment. The design goal
for the encoder is to provide a predetermined, fixed gain for all sub-
threshold signal components. If a large signal component appears at a
particular frequency or frequencies, then the gain should be reduced at
those frequencies only in accordance with a predetermined compression
law so that it is possible to restore the signal during decoding. In
other words, the compressor tries to keep all low level signal
components fully boosted at all times; when the boosting must be cut
back at a particular frequency the effect should not be extended to low-
level signal components at other frequencies.
The audible effect of this type of compression is that the signal
appears to be enhanced and brighter but without any apparent dynamic
compression effects (the ear detects dynamic action primarily by the
effect of a gain change due to a signal component at one frequency on a
signal component at some other frequency). If the ear cannot detect
dynamic effects in the compressed signal then a) it is unlikely that
noise modulation effects will be evident in the decoded signal, and b)
it is unlikely that signal modulation effects will be evident in the
decoded signal if there should be a gain or frequency response error in
the recording or transmission channel.
In the spectral recording process two new methods are used that
greatly reduce the circuitry required to achieve the design goal of a
fully spectrally responsive system. In particular, both fixed and sliding
bands are used in a unique combination, called action substitution,
that draws on the best features of both types of circuits. A further
technique, called modulation control, greatly improves the performance
of both fixed and sliding bands in resisting any modulation of signal
components unless necessary.
The use of the new methods reduces the basic encoder to two frequency
bands only (high frequency and low frequency), each with a fixed band
and a sliding band. When the three-level action staggering layout is
taken into account, five fixed bands and five sliding bands are
employed.
Action Substitution
-------------------
Both fixed band and sliding band dynamic actions are used in each of
the five stages. In any particular stage, fixed band operation is used
whenever it provides best performance; sliding band operation is
substituted whenever it has an advantage. In this way the best features
of both methods are obtained, without the attendant disadvantages of
each.
The substitution is effective on a continuous and frequency by
frequency basis. For example, the output from a given high frequency
stage will typically be from the fixed band for frequencies up to the
dominant signal component and from the sliding band above that
frequency.
The advantages of fixed band circuits arise from the fact that all
signal frequencies within the band are treated equally, in contrast with
sliding band action. Thus the appearance of a signal component actuating
the compressor results in a loss of noise reduction effect that
manifests itself in a uniform manner throughout the band; the loss is
not concentrated in any particular frequency region, as in sliding band
circuits.
In contrast, the advantages of sliding band compression and expansion
circuits derive from the fact that all signal frequencies are not
treated equally. In particular, compression, expansion, and noise
reduction action are well maintained above the frequency of the dominant
signal component in high frequency circuits, and below the frequency of
the dominant signal component in low frequency circuits; this action
maintenance effect, except on a one to one basis, is absent in fixed
band circuits.
The action substitution technique provides the advantages of fixed and
sliding band circuits while avoiding their disadvantages. In other
words, there is a significantly improved adherence to the ideal of least
signal treatment; the signal more closely approaches fully boosted
conditions in the encoding mode, with a consequently improved noise
reduction effect in the decoding mode.
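A toy model may help to visualize action substitution in a single high
frequency stage. Everything in it beyond the 8 dB stage gain is an
assumption made for illustration: the sliding band shape is a made-up
curve, and the substitution is modelled as the larger of the two boosts
at each frequency, which is a stand-in for the behaviour of the actual
circuit and not a description of it.

```python
import math

FULL_BOOST_DB = 8.0  # low-level gain of one stage (from the text)


def fixed_band_db(f_hz: float, boost_at_dominant_db: float) -> float:
    """Fixed band: the same reduced boost at every frequency in the band."""
    return boost_at_dominant_db


def sliding_band_db(f_hz: float, f_dominant_hz: float,
                    boost_at_dominant_db: float) -> float:
    """Sliding band (hypothetical shape): boost recovers above the dominant.

    The corner frequency is placed so that the boost at the dominant
    frequency equals boost_at_dominant_db (0 < value < FULL_BOOST_DB).
    """
    g = boost_at_dominant_db
    corner_hz = f_dominant_hz * math.sqrt((FULL_BOOST_DB - g) / g)
    ratio = (f_hz / corner_hz) ** 2
    return FULL_BOOST_DB * ratio / (1 + ratio)


def substituted_db(f_hz: float, f_dominant_hz: float,
                   boost_at_dominant_db: float) -> float:
    """Assumed substitution rule: whichever band gives more boost prevails."""
    return max(
        fixed_band_db(f_hz, boost_at_dominant_db),
        sliding_band_db(f_hz, f_dominant_hz, boost_at_dominant_db),
    )


# Dominant component at 3 kHz whose boost has been cut back to 2 dB.
for f in (1_000, 2_000, 3_000, 6_000, 12_000):
    print(f"{f:>6} Hz: {substituted_db(f, 3_000, 2.0):.2f} dB")
```

With these assumptions the result follows the fixed band up to the
dominant frequency and the sliding band above it, in line with the
example given earlier in this section.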
Modulation Control
------------------
In the A-type, B-type, and C-type systems the signal from the side
chain is highly limited under high-level signal conditions. This high
degree of limiting, beginning at a low-level threshold, is responsible
for the low distortion, low overshoot, and low modulation distortion
which characterize these systems.
A closer examination shows that it is unnecessary to utilize such a
low threshold and such a strong limiting characteristic under certain
signal conditions. In particular, whenever the side chain signal departs
from an in-phase condition with respect to the main path signal, then
the threshold can be raised. Furthermore, after an appropriate degree of
limiting has taken place at a given frequency (in order to create the
desired overall compression law), then it is unnecessary to continue the
limiting as the signal level rises. Rather the level of the side chain
signal can be allowed to rise as the input signal rises, stabilizing at
some significant fraction of the main path signal level.
In the fixed band portions of the circuit the above arrangement
results in conventional performance in the pass-band (in-phase)
frequency region. However, in the stop-band region the modulation
control arrangement causes the limiting threshold to rise and the degree
of limiting to be reduced. In this way large signals in the stop-band do
not cause signal modulation in the pass-band and a consequent
impairment of the noise reduction effect achieved during decoding.
Similar considerations apply in the SR sliding band circuits. Above
the threshold at a particular frequency the variable filter slides to
the turnover frequency needed to create the overall (main path plus side
chain signal) compression law. As the input level rises, and once an
overall gain of unity is obtained, there is no reason for further
sliding of the variable filter. At this point the modulation control
arrangement counteracts further sliding of the variable filter; as with
the fixed band circuits, this prevents unnecessary modulation of the
signal with consequent impairment of the noise reduction effect.
The modulation control aspects of the SR process result in a
compression action which is remarkably free of noticeable signal related
modulation effects. Working together with action substitution,
modulation control contributes to the goal of least treatment, in
providing a highly boosted, audibly stable signal.
Overshoot Suppression
---------------------
A highly flexible overshoot suppression system is used; a multiplicity
of overshoot suppression circuits operate directly upon the control
signals of the various stages. The SR process employs overshoot
suppression thresholds that are significantly higher than the steady
state thresholds; the low level overshoot suppression levels are set at
about 10 dB above the relevant steady state thresholds. The overshoot
suppression effects are then phased in gradually as a function of
increasing impulse level. The net result is that for most musical
signals the overshoot suppressers rarely operate; the compressors are
controlled by well smoothed, double integrated dc control signals. When
the suppressers do operate, the effect is so controlled that modulation
distortion is minimal. Under relatively steady state, but nonetheless
changing, signal conditions the overshoot suppression effects are
gradually phased out with increasing signal levels; this action further
ensures low overall modulation distortion from the system. The
thresholds are controlled by the same modulation control circuits used
to control the steady state characteristics; thus there is a tracking
action between the transient and steady state behavior.
In the low frequency circuits special overshoot suppressers are used
for relatively slowly changing low frequency signals; these are very
gentle, slow acting circuits which reduce low frequency transient
distortion.
Operating Characteristics
-------------------------
Quiescent (sub-threshold) noise reduction effect:
------------------------------------------------
The spectral recording process has been designed in a way that takes
advantage of the characteristics of hearing and of existing recording
processes. There is less of a problem in the generation and perception
of noises at moderately low frequencies (e.g. 200 Hz) than at moderately
high frequencies (e.g. 3 kHz); therefore two low frequency stages are
employed, but three high frequency stages are used. A noise reduction
effect of about 16 dB is obtained at low frequencies; at high
frequencies the effect is about 24 dB.
At very low and very high frequencies less noise reduction is needed
(below 50 Hz and above 10 kHz). Strong spectral skewing actions can
therefore be used in these regions, resulting in high and low frequency
sliding band actions which are more accurate in the event that the tape
recorder has response irregularities in these regions. Additionally, the
spectral skewing networks provide for good immunity to high and low
frequency interference (supersonic audio components, tape recorder bias;
subsonic noise components, particularly room rumble from traffic and air
conditioning).
The overall shape of the low-level noise reduction characteristic
resembles the inverse of the low level Fletcher-Munson and Robinson-
Dadson curves, as well as the consequently derived CCIR noise weighting
curve.
The amount of noise reduction provided by the spectral recording
process is enough to yield an overall usable dynamic range with 15 ips
tape that comfortably equals or exceeds that of 16 bit PCM. For example,
if signal gains are adjusted such that the audible noise levels of the
two recording processes are similar, at high levels SR will typically
have several dB of further soft clipping headroom available. At very low
signal levels, with increased monitor gain, the audible signal quality
of SR will be superior because of its inherently linear analogue
transfer characteristic.
Dynamic action for steady-state dominant signals:
------------------------------------------------
Low Frequencies: dynamic action occurs in the range -48 dB to -5 dB
(with respect to reference level); i.e. there is no dynamic action,
only full boosting or attenuation, in the lower 35-40 dB of the total
dynamic range (starting from the system noise level), and no action in
the top 25 dB of the total dynamic range (ending with the clipping
level): there is a linear dynamic characteristic in these two regions.
High Frequencies: dynamic action occurs in the range -62 dB to -5 dB
(with respect to reference level); i.e. there is no dynamic action,
only full boosting or attenuation, in the lower 20-25 dB of the total
dynamic range (starting from the system noise level), and no action in
the top 25 dB of the total dynamic range (ending with the clipping
level): there is a linear dynamic characteristic in these two regions.
In the dynamic action ranges the effects of the multi-level stages are
joined together to create a compression ratio of about 2:1.
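The steady-state figures above can be turned into an idealized static
encoder curve. The sketch assumes a piecewise-linear characteristic in
which the boost falls linearly (in dB) from its full value at the
bottom of the dynamic action range to zero at the top; the real
characteristic need not have this shape, and the small additional
action above reference level is ignored. Function names are
illustrative.

```python
def encode_db(level_db: float, low_db: float, high_db: float,
              boost_db: float) -> float:
    """Idealized static encoder curve; all levels in dB re reference level."""
    if level_db <= low_db:
        return level_db + boost_db        # linear region, full boost
    if level_db >= high_db:
        return level_db                   # linear region, no boost
    # Dynamic action region: boost shrinks linearly from boost_db to 0.
    fraction = (high_db - level_db) / (high_db - low_db)
    return level_db + boost_db * fraction


def compression_ratio(low_db: float, high_db: float, boost_db: float) -> float:
    """Input span divided by output span across the dynamic action range."""
    span_in = high_db - low_db
    return span_in / (span_in - boost_db)


# Ranges and total boosts as quoted in the text.
print("LF:", round(compression_ratio(-48.0, -5.0, 16.0), 2))
print("HF:", round(compression_ratio(-62.0, -5.0, 24.0), 2))
```

Under this idealization the average ratio comes out at roughly 1.6:1
for low frequencies and 1.7:1 for high frequencies, in the same region
as the quoted figure of about 2:1.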
Dynamic action for steady-state non-dominant signal components:
--------------------------------------------------------------
Non-dominant signals are boosted or attenuated over and above that of
the dominant signal towards the two spectrum ends by high and low
frequency sliding band actions. If there are two dominant signals, a
fixed band compression or expansion effect prevails for the non-dominant
signal components (therefore no mid-band modulation effect).
Thus, non-dominant signal components are boosted or attenuated by an
amount at least equal to that of the dominant signal. The boosting or
attenuation of the non-dominant signals is maintained towards the
spectrum ends even though the level of the dominant is relatively high
(e.g. in the range -10 dB to +20 dB, with respect to reference level).
This boosting or attenuation action spectrally tracks the dominant
signal frequency or frequencies.
To provide a steep boosting or attenuation effect away from the
frequency of the dominant signal component, the SR circuit employs the
steepness enhancing effect that arises from the use of cascaded stages.
The low frequencies have two stages of steepness compounding. At high
frequencies the use of three stages improves the effect even further.
The overall result is that the encoder circuit tends toward keeping
all low level signal components boosted at all times. Only those
components above the threshold are subject to a reduction of boosting.
The advantages of this type of characteristic are:
a) A powerful noise reduction effect in the presence of signals, much
more so than with any previous system. This property is responsible for
the high purity of signals from the SR system.
b) Freedom from the mid-band modulation effect. The system is
essentially immune to exaggerated frequency response errors due to
frequency response problems in the recorder, including 30 ips head-bump
signal-pumping effects.
The audible encoding effect of the system is to create a dense, bright
sounding signal, but with little or no apparent dynamic action.
Harmonics, overtones, and small scale components of the sound, including
noise, are all enhanced.
The decoding effect of the system, in response to an encoded signal,
is to provide a wholly restored signal with respect to frequency and
phase response, including all transient effects. Regarding noises
introduced into the encoded signal, the decoding property of the system
is to create a very clean sounding replica of the input. The decoder
reduces the tape bias noise and modulation noise, spectrally and
temporally, in a way that previous systems have not been able to do.
Moreover, the low frequency noise reduction effect of the system is
quite useful in dealing with high frequency intermodulation components.
For example, if two or more simultaneous high frequency tones are
applied, at a level high enough to create audible intermodulation
distortion, the system will significantly reduce the lower frequency
distortion components produced.
The decoder is also useful in reducing harmonic distortion produced by
the recording medium. Steady-state third harmonic distortion is
typically reduced to less than one-half; fifth harmonic distortion is
reduced to less than one-quarter. Higher order harmonics are even
further reduced. Thus, especially if the medium has a hard clipping
characteristic, the audible cleanliness of the signal at high recording
levels is significantly improved.
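Expressed in decibels, and assuming the quoted fractions are amplitude
ratios, the harmonic reductions above work out as follows (the helper
name is illustrative):

```python
import math


def ratio_to_db(amplitude_ratio: float) -> float:
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(amplitude_ratio)


for name, ratio in (("third harmonic", 0.5), ("fifth harmonic", 0.25)):
    print(f"{name}: reduced by at least {-ratio_to_db(ratio):.1f} dB")
# prints:
# third harmonic: reduced by at least 6.0 dB
# fifth harmonic: reduced by at least 12.0 dB
```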
Antisaturation aspects
----------------------
The antisaturation effects of SR are maintained down to fairly low
levels (about 15 dB below reference level). The result is to produce
recordings that are notably freer of intermodulation distortion than
would otherwise be the case. This is particularly true in live recording
situations, in which significant signal components may exist.
At low frequencies the LF antisaturation characteristic has a double
significance. First, low frequency signal components are reduced in
amplitude on the recording, thereby permitting higher signal levels at
higher frequencies (in optical recording, the spaces used on the sound
track add directly, in contrast with magnetic recording, which, to some
extent allows the high frequencies to be superimposed on the low
frequency components). Second, the antisaturation characteristic carries
on strongly down to very low frequencies (40 Hz, 20 Hz). This allows the
recording and reproduction of low frequency special effects with ease
(10 dB of antisaturation at 25 Hz).
The existence of the new system will no doubt prompt a re-evaluation
of recording and transmission formats which have been thought to be
inadequate or marginal for professional use. In radio broadcasting, for
example, the new system might result in greater use of 7 1/2 ips and
various cassette and cartridge formats. Conventional landlines will also
be worthy of investigation; the new system is substantially more
effective than previous ones in dealing with landline types of noises
(whistles, hums, buzzes, crosstalk).
Calibration Arrangements
------------------------
The spectral recording calibration procedures and circuits are
conceptually similar to those of the A-type noise reduction system. That
is, signal levels in the decoder circuit ideally should match those in
the encoder circuit (however, the SR system is more tolerant of gain and
frequency response errors than A-type). For tape interchange
standardization it is also preferable if, at least within a given
organization, the "Dolby Level" of the encoder and decoder corresponds
to a known and fixed flux level. Whether or not a standardized flux is
used for Dolby Level, the matching of the decoder to the encoder is
accomplished by a calibration signal generated in the encoder and
recorded on the tape; this allows the tape replay gain to be set
correctly, using the meter in the decoder unit.
Most problems in the studio use of A-type noise reduction, and indeed
analogue recording in general, can be traced to incorrect level settings
and/or frequency response errors in the recorder. This may be because
checking these factors is a time consuming and boring process. A faster
and more interesting method of accomplishing these checks would be more
likely to produce reliable and consistent results.
Commercial embodiments of the SR process include a pink noise
generator which is used for both level and frequency response
calibration, instead of a single-tone sine wave oscillator. For
identification, the pink noise is interrupted with 20 ms "nicks" every 2
seconds; the resulting signal is called "Dolby Noise". During recording
this signal is fed to the tape at a level of 15 dB below reference level
(Dolby Level), a level low enough not to cause saturation problems with
low speed tape recording or highly equalized transmission channels.
During playback the tape signal is automatically alternated with
internally generated reference pink noise (uninterrupted) in 4 second
segments (8 second total cycle time) and passed to the monitor output.
An audible comparison can thus be made between the reference pink
noise and the Dolby Noise coming from the tape. This mode of operation
is called "Auto Compare". Any discrepancies in level and/or spectral
balance are immediately noticeable and can be corrected or at least
taken note of. If desired, the signal can also be fed to a real time
analyzer. The 20 ms nicks in the signal do not affect the analyzer
display because of the peak hold circuits employed in the analyzer.
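The Dolby Noise and Auto Compare timing can be summarized in a short
sketch. The durations are those given in the text; which source comes
first in the cycle, and the function names, are assumptions made for
illustration. Times are in seconds and are taken to be non-negative.

```python
NICK_PERIOD_S = 2.0     # spacing of the nicks in Dolby Noise (from the text)
NICK_LENGTH_S = 0.020   # duration of each nick (from the text)
SEGMENT_S = 4.0         # length of each Auto Compare segment (from the text)


def monitor_source(t_s: float, tape_first: bool = True) -> str:
    """Which signal reaches the monitor at time t_s during Auto Compare."""
    segment_index = int(t_s // SEGMENT_S)
    is_even = segment_index % 2 == 0
    return "tape" if is_even == tape_first else "reference"


def in_nick(t_tape_s: float) -> bool:
    """True while the recorded Dolby Noise is interrupted by a nick."""
    return (t_tape_s % NICK_PERIOD_S) < NICK_LENGTH_S


# One full 8 second cycle, sampled once per second.
for t in range(8):
    print(t, monitor_source(float(t)))
```

The nicks occupy only 1% of the recorded signal (20 ms in every 2
seconds), which is consistent with their having little effect on a
level reading.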
In using the new calibration method it is important to be able to tell
when the 4 second tape segments are being passed to the monitor and when
the signal heard is from the reference pink noise generator.
Differentiation is provided in two ways. First, the reference segments
are 4 seconds of continuous pink noise, whereas the tape segments begin
with a nick, have a nick in the middle, and end with a nick; this time
sequence is easily identified with a little practice. Second, lights on
the SR module identify the signals. A green
light marks the tape signal, a red light the reference signal. An output
is available for actuating externally mounted lights, such as near the
loudspeakers.
A further yellow light on the module shows that the module is in the
Dolby Noise mode, which is actuated by pressing the "Dolby Tone" button
on the frame.
During calibration the meter circuit in the frame is fed by a band
limited (200 Hz - 4 kHz) Auto Compare signal from the module. Band
limiting reduces the effect on the meter reading of frequency response
errors in the tape recorder and also improves the stability of the
reading (less bouncing).
The calibration facility built into the new system will give the
recording engineer and producer a degree of control and monitoring of
the recording process that was previously unavailable. At any time an Auto
Compare check of the recorder can be made. The result can be heard
immediately and conclusions drawn about whether any recorder adjustments
might be necessary.
With tape and signal interchanges it will be possible to tell quickly
whether there is any error or misunderstanding about levels,
equalization, azimuth and the like. If the original recording of the
Dolby Noise stays with the tape, quality of the ultimate playback, even
after copying, will be retained. Thus the Auto Compare function serves
to ensure that the recorder and spectral recording process will provide
on a routine basis the high signal quality and reliability of which they
are capable.
Conclusion
----------
Brief details of the new spectral recording process have been given. A
full technical account of the system was presented at the Fall 1986 AES
Convention in Los Angeles. Reprints are available from the AES.